In this part of the course, we will cover the following concepts:
| Objective | Complete |
|---|---|
| Discuss data visualization and exploratory data analysis | |
| Describe chart types by data and form | |
The 1986 Space Shuttle Challenger explosion is an emblematic case study of how data visualization can play an essential role in decision-making
The explosion happened because low launch-day temperatures compromised the O-ring seals in the shuttle's solid rocket boosters
Edward Tufte, a visualization expert, argues that the cause of this tragedy was an unreadable format of data given to decision-makers
The chart below was presented to the experts at the time
How easy is it to interpret the chart?
Edward Tufte argues a better chart may have prevented disaster
How easy is it to interpret the revision created?
Data visualization is an attempt to make data more easily digestible by rendering it in a visual context (e.g., charts and graphs)
We use data visualization to transform raw data into something compelling
Data visualization is at the intersection of art and science
Visual context provides insights on patterns, trends, and correlations that might be difficult to detect otherwise
It is a simple way to convey concepts and provide visual access to large amounts of complex data
Python is well suited to data visualization because it has multiple graphing libraries with many valuable features
To provide valuable, interpretable, and relevant insights
To give a visual or graphical representation of data / concepts
To communicate ideas
To provide an accessible way to see and understand trends, outliers, and patterns in data
To try to confirm a hypothesis
Take a couple minutes to explore the dashboard
It was designed to answer various questions or user queries about the financial metrics of business operations
You can access the dashboard from the following link
Exploratory data analysis (EDA) is the process of reviewing new data to discover patterns, spot anomalies, test hypotheses, and check assumptions
It helps to create graphs without breaking the train of thought as you explore your data
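As a small illustration of "spotting anomalies" during EDA, the sketch below flags unusual values with the common 1.5 × IQR rule. The numbers are invented for the example, and the IQR rule is only one of several possible checks, not a method prescribed by this course.

```python
import statistics

# Hypothetical measurements, invented for illustration; one value is clearly off
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]

# Quartiles split the sorted data into four parts; quantiles() returns Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Anything beyond 1.5 * IQR from the quartiles is flagged as a potential anomaly
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("Median:", q2)
print("Potential anomalies:", outliers)
```

A flagged value is a prompt to look closer at the data, not proof of an error.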
Visualization is an iterative process and consists of a few steps:
Python is a powerful tool for EDA because the graphics tie in with the functions used to analyze data
What is possible using Python?
Graphing libraries: matplotlib, seaborn
Output formats: SVG, PNG, JPEG, BMP, PDF
Further, we will explore how to visualize data using Python and perform exploratory data analysis to understand and detect patterns
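To make the idea of multiple output formats concrete, here is a minimal sketch that draws one chart with matplotlib and saves it as PNG, SVG, and PDF. The helper name `save_in_formats`, the file stem, and the plotted numbers are assumptions made for this example. Other formats may need additional packages, so the sketch sticks to these three.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt


def save_in_formats(fig, stem, formats=("png", "svg", "pdf")):
    """Save one figure under several file formats and return the paths."""
    paths = []
    for ext in formats:
        path = Path(f"{stem}.{ext}")
        fig.savefig(path, format=ext, bbox_inches="tight")
        paths.append(path)
    return paths


# Write into a temporary directory so the example leaves no files behind elsewhere
out_dir = Path(tempfile.mkdtemp())

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 20, 15, 30], marker="o")  # made-up values
ax.set_xlabel("Quarter")
ax.set_ylabel("Revenue (made-up units)")
ax.set_title("Illustrative line chart")

saved = save_in_formats(fig, out_dir / "example_chart")
plt.close(fig)
print([p.name for p in saved])
```

Vector formats such as SVG and PDF scale cleanly for print, while PNG is a convenient choice for slides and web pages.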
| Objective | Complete |
|---|---|
| Discuss data visualization and exploratory data analysis | ✔ |
| Describe chart types by data and form | |
Deciding on what visualization type to use will depend on the data and message you want to communicate
Common data types include:
Categorical data is non-numeric or qualitative
Insight: comparisons and proportions
Chart types: vertical bar (column), horizontal bar, stacked bar, pie, bullet, and treemap
Univariate data consists of a single numeric variable
Insight: distributions, proportions, and frequencies
Chart types: histogram, density, box plots
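As a rough sketch of the univariate case, the code below draws a histogram and a box plot of the same variable side by side with matplotlib. The "heights" are randomly generated for illustration, and the bin count of 20 is an arbitrary choice.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Simulated heights in cm; generated data, not real measurements
random.seed(0)
heights = [random.gauss(170, 10) for _ in range(500)]

fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(9, 3.5))

# Histogram: shows the shape of the distribution and the frequency per bin
counts, bin_edges, _ = ax_hist.hist(heights, bins=20, edgecolor="white")
ax_hist.set_xlabel("Height (cm)")
ax_hist.set_ylabel("Frequency")
ax_hist.set_title("Histogram")

# Box plot: summarizes median, quartiles, and potential outliers
ax_box.boxplot(heights)
ax_box.set_ylabel("Height (cm)")
ax_box.set_title("Box plot")

fig.tight_layout()
plt.close(fig)
```

The histogram answers "what does the distribution look like?", while the box plot gives a compact summary that is easy to compare across groups.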
Bivariate or multivariate data consists of two or more numeric variables (e.g., weight and height)
Insight: relationships, correlation, proportions, and frequencies
Chart types: scatterplot, bubble, parallel coordinates, radar, bullet, and heatmap
Trend data includes a time-based component (e.g., years, months, days, or hours)
Insight: trends, comparisons, and cycles
Chart types: line, area, bubble, vertical bar
Text data includes alphanumeric single words or phrases (keywords)
Insight: sentiment, comparisons, and frequency
Chart types: word cloud, histogram, stacked bar chart
Geospatial data includes qualitative or quantitative information about specific locations
Insight: locations, comparisons, and trends
Chart types: choropleth (filled) map, point map, connection map, isopleth map
Let’s review when to use some of the common visualizations, including:
Simple text is used when there is just a number or two to share. Simple text can be a great way to communicate something like:
Tables are helpful when communicating to a mixed audience or showing a few different units of measure
Bar charts are used to show larger variations in data, how individual data points relate to a whole, comparisons, and rankings
They express quantities through a bar’s length, using a common baseline (zero)
Note: when the data has lengthy names, using a horizontal bar chart will make the data easier to read
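The points above can be sketched with matplotlib's `barh`, which draws horizontal bars from a zero baseline and leaves room for long labels. The category names and counts below are hypothetical and exist only to show lengthy labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Hypothetical categories and counts, invented for illustration only
operations = {
    "Cardiothoracic and vascular surgery": 120,
    "Orthopedics and sports medicine": 310,
    "Ear, nose, and throat": 95,
    "General surgery": 240,
}

# Sort ascending: barh draws the first item at the bottom,
# so the largest category ends up at the top of the chart
ranked = sorted(operations.items(), key=lambda item: item[1])
labels = [name for name, _ in ranked]
values = [count for _, count in ranked]

fig, ax = plt.subplots(figsize=(7, 3))
bars = ax.barh(labels, values)
ax.set_xlim(left=0)  # keep the common zero baseline explicit
ax.set_xlabel("Number of operations")
ax.set_title("Operations by department (illustrative data)")
fig.tight_layout()
plt.close(fig)
```

Sorting the bars turns the chart into a ranking, and the zero baseline keeps bar lengths proportional to the values they represent.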
Line charts are used to plot continuous data in some unit of time, such as days, months, quarters or years
They can also be used to show multiple series of data
A line chart can also represent a summary statistic, like the average with its confidence interval, or the point estimate of a forecast
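One way to draw a summary statistic with its uncertainty is a mean line plus a shaded band. The sketch below uses simulated monthly observations and an approximate 95% interval (mean ± 1.96 standard errors, a normal approximation). Both the data and the choice of interval are assumptions made for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
months = np.arange(1, 13)

# Simulated data: 30 observations per month with an upward trend
samples = rng.normal(loc=100 + 5 * months, scale=15, size=(30, 12))

# Monthly mean and an approximate 95% interval around it
mean = samples.mean(axis=0)
sem = samples.std(axis=0, ddof=1) / np.sqrt(samples.shape[0])
lower = mean - 1.96 * sem
upper = mean + 1.96 * sem

fig, ax = plt.subplots(figsize=(7, 3.5))
ax.plot(months, mean, marker="o", label="Monthly average")
ax.fill_between(months, lower, upper, alpha=0.3, label="Approx. 95% interval")
ax.set_xticks(months)
ax.set_xlabel("Month")
ax.set_ylabel("Value (simulated)")
ax.legend()
fig.tight_layout()
plt.close(fig)
```

The shaded band lets the audience see at a glance how much the average could plausibly vary.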
Area charts are used to summarize relationships between datasets and to show how individual data points relate to a whole
The visual at the right shows the monthly trend of active operations
In chat, share your thoughts on how you think this visual could be improved
Heatmaps visualize data in tabular format, using colored cells to show the relative magnitude of the numbers
When using a heatmap, it is helpful to restrict the number of different color gradations
The visual at the right shows the busiest months ranked by the number of operations for each department
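A heatmap of this kind can be approximated in matplotlib with `imshow`. The sketch below restricts the color scale to five gradations by requesting a five-level version of the "Blues" colormap. The department names and counts are randomly generated placeholders, not the data behind the visual on the slide.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
departments = ["Dept A", "Dept B", "Dept C"]  # placeholder names
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Random counts of operations per department and month (illustrative only)
counts = rng.integers(20, 200, size=(len(departments), len(months)))

# Limit the colormap to 5 discrete gradations so cells are easy to tell apart
cmap = plt.get_cmap("Blues", 5)

fig, ax = plt.subplots(figsize=(9, 2.5))
image = ax.imshow(counts, cmap=cmap, aspect="auto")
ax.set_xticks(list(range(len(months))))
ax.set_xticklabels(months)
ax.set_yticks(list(range(len(departments))))
ax.set_yticklabels(departments)
fig.colorbar(image, ax=ax, label="Number of operations")
fig.tight_layout()
plt.close(fig)
```

With only a handful of color steps, readers can match a cell to the legend without having to judge subtle differences in shade.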
Scatterplots show the type of relationship between two numeric variables
Scatterplots are often used in scientific fields and are sometimes viewed as “complicated” to understand, but there are real-world uses as well
In chat, share your thoughts on what relationship this scatterplot represents
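As a hedged sketch of reading a relationship from a scatterplot, the code below plots simulated height and weight values and reports the Pearson correlation coefficient alongside the chart. The data is generated with a built-in positive relationship, so it illustrates the technique rather than any real finding.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Simulated data: weight rises with height, plus random noise
height = rng.normal(170, 10, size=200)
weight = 0.9 * height - 85 + rng.normal(0, 6, size=200)

# Pearson correlation: close to +1 or -1 means a strong linear relationship
r = np.corrcoef(height, weight)[0, 1]

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(height, weight, alpha=0.6)
ax.set_xlabel("Height (cm)")
ax.set_ylabel("Weight (kg)")
ax.set_title(f"Simulated height vs. weight (r = {r:.2f})")
fig.tight_layout()
plt.close(fig)
```

An upward-sloping cloud of points suggests a positive relationship, a downward slope a negative one, and a shapeless cloud little or no linear relationship.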
| Objective | Complete |
|---|---|
| Discuss data visualization and exploratory data analysis | ✔ |
| Describe chart types by data and form | ✔ |